APPENDIX X


Encoding Relations Between Index Entries


Summary

In many searches, no ambiguity or uncertainty can arise when
the scope of search is defined by a combination of entities, processes
and attributes corresponding to various index entries. This is
the case when the very nature of the entities, processes, etc.,
permits little or no uncertainty as to their interrelationship.
Oranges are imported to Boston from California and never to California
from Boston. In general, however, a considerable degree of uncertainty
is possible or even probable.

This appendix is concerned with various ways in which relation- 
ships stated in a document can be encoded for machine searching. 

One possibility is to attach significance to the order of citation
of the index entries, but this involves undue complications with
machines available at present or likely to be constructed in the
near future. The most practical approach is to establish a system
of role indicators. In practice these would be symbols that would
be attached to index entries for the purpose of resolving uncertainty
as to the relationships existing between them. In order to keep the
coding system as simple as possible, it would probably be best to
employ a small number of role indicators, each of which has a broad
general significance akin to argument position in formal logic. For
certain important roles, e.g., "raw material", "product", "conditioning
agency", it may be worthwhile to set up special role indicators
having specific significance.

Introduction 

As we have seen, the basic step in indexing any given document
is to decide which objects, persons, processes, attributes, locations,
etc., referred to by the document are of interest in selecting and
correlating the information contained in the document. As a consequence,
policy decisions as to how indexing can be accomplished
most advantageously must take into account the purpose or purposes
that a file of documents must serve. Previous discussion has also
pointed out that the effectiveness of presently available automatic
equipment is greatly increased by appropriately encoding index entries.
In particular, it is highly advantageous to employ a coding system
which is so constructed that both specific and generic terminology
can be used to define and conduct searching and correlating operations.

Construction of an effective code requires that a large mass of
terminology be appropriately processed. Appendix IX presents procedures
and techniques that we have developed so as to expedite the
analyzing and encoding of scientific and technical terms.

Once the appropriate index entries have been set up and encoded,
their punching in cards can then provide a file searchable by machine.

As already noted, it is an essential feature of the new indexing
system that all the entries pertaining to any one document - or, more
precisely, to any one unit of information - are punched one after
another in a single card, or in a sequence of cards which acts as a
unit as far as machine searching and selecting operations are concerned.
This makes it possible to direct a search to any one index entry or to
various combinations of entries.

Juxtaposition of index entries in a single card - or sequence of
cards acting as a unit - indicates in our system that the various
entries pertain to some one document (or unit of information). This
simple relationship of belonging together may suffice to avoid ambiguity
and uncertainty in defining and conducting many searches. Thus if we
direct a search to the combination of entries "fire" "gasoline"
"extinguish" "foam" we will probably select documents pertaining to the
use of a foam to extinguish a gasoline fire, but conceivably we might
also locate items concerned with extinguishing a fire involving
gasoline in foam form. Such items would probably be few in number, as
gasoline in foam form is rarely produced, even experimentally.
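
The search by simple juxtaposition described above can be sketched in
Python. This is only an illustration: the three cards, their document
labels and the extra entries "sand", "wood" and "water" are invented
for the example, and each card is modelled as an unordered set of index
entries.

```python
# Hypothetical file: each "card" is the unordered set of index entries
# punched for one document. Labels and contents are invented.
cards = {
    "doc-1": {"fire", "gasoline", "extinguish", "foam"},          # foam used on a gasoline fire
    "doc-2": {"fire", "gasoline", "extinguish", "foam", "sand"},  # gasoline in foam form on fire
    "doc-3": {"fire", "wood", "extinguish", "water"},
}

def search(cards, wanted):
    """Return the labels of all cards containing every wanted entry."""
    wanted = set(wanted)
    return sorted(doc for doc, entries in cards.items() if wanted <= entries)

hits = search(cards, ["fire", "gasoline", "extinguish", "foam"])
print(hits)
```

Both doc-1 and doc-2 are selected, since juxtaposition alone records
only that the entries belong to the same document and not how they are
related to one another.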

All possible ambiguity can be eliminated in simple cases by setting
up an appropriate convention. Thus, for example, we might establish
the following simple convention to indicate that a given compound has
certain physical properties. The convention would be to punch as a block
of entries on a single card (or group of cards acting as a unit)
first, the encoded representation of a compound's molecular structure,
followed by data as to its melting point, boiling point, refractive
index, density, etc. If one of the properties were the solubility
of some substance liquid at ordinary temperatures - for
example, ether - then additional conventions may be necessary to
avoid ambiguity as to whether the solubility recorded by punching
on our card refers to our liquid substance dissolved in water or
water dissolved in our liquid substance. Such conventional methods
of punching are relatively easy to set up as long as the possibility
of ambiguity is restricted to solubility.


Solubility is only one of many relations in which ambiguity may
be involved. Other examples might be the starting point and destination
of a trip, temporal sequence of two or more events, differences
in properties between substances such as "B is harder than A". This
list of possibilities is capable of almost indefinite expansion.

Such relations, in which there is no a priori certainty that A
stands in relation R to B rather than B standing in relation R to A,
we shall term asymmetric relations.

The relation involved in many interactions may be asymmetric
in nature. We have already mentioned the example of A dissolves B
and its counterpart B dissolves A. An example from international
relations might be A attacks B and B attacks A. In the business
world we might have Company A is a subsidiary of Company B and the
possible alternative Company B is the subsidiary of Company A.

In considering means for resolving ambiguous relations, it is
helpful to use a simple notation. Thus, in the examples given above,
we used the letters A and B to refer to pairs of substances, countries
and companies. If now we use R to indicate any one of the relations
referred to, we might reduce our relations involving (1) solubility,
(2) attack and (3) subsidiary status to a single pair of generalized
relations,


          A R's B
and
          B R's A


In contemplating the problem of asymmetric relations from the
viewpoint of machine searching methods, one very important consideration
relates to various means for so expressing the relationship
actually existing in a given instance that the machine can take
cognizance of that existing relationship when conducting searching and
selecting operations.

Perhaps the simplest - though not necessarily the most
usable - device for resolving asymmetric relations is the one employed
very extensively by the English language, namely word order. We might
establish the convention that when A is in the relation R to B, this
fact is indicated by the order of citation of the three symbols. Thus,
for our example, we might stipulate that A shall be cited first,
followed by R and B in that order.

It might be observed that, as a matter of logic, or so far as
machine searching is concerned, there is no reason, except perhaps
similarity to the familiar practices of the English language, why any
one sequence of the three symbols should be given preference in
establishing the convention selected to indicate that A is in relation R
to B. Instead of A R B, we might with equal logic select R B A or
A B R or B A R or R A B or B R A. Nor is it necessary that the
inverse order of symbols be used to designate an inverse relationship.
Thus we might use A B R to indicate that A is in relation R to B
and use B A R to indicate that B is in relation R to A.

What is important is that selections must be made and adhered
to without exception, if order of citation of the symbols is to be
used effectively to resolve ambiguity when dealing with asymmetric
relations. For any given relation R which is asymmetric, it would be
entirely possible for the code dictionary to specify which order of
symbols has been selected to designate that one entity is in relation
R to another.
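
A code dictionary that fixes the order of citation for each relation
might be sketched as follows. The table ORDER, the function names and
the particular orders chosen are assumptions made for illustration; the
text requires only that some order be selected for each relation and
adhered to without exception.

```python
# Hypothetical code dictionary: for each asymmetric relation, the fixed
# order in which the relation symbol "R" and its arguments "A" and "B"
# are cited. The orders shown are arbitrary choices for illustration.
ORDER = {
    "dissolves": ("A", "R", "B"),  # the familiar English order
    "attacks": ("R", "A", "B"),    # an equally logical alternative
}

def encode(relation, a, b):
    """Cite a, the relation and b in the order the dictionary prescribes."""
    slot = {"A": a, "R": relation, "B": b}
    return tuple(slot[s] for s in ORDER[relation])

def decode(relation, cited):
    """Recover (a, b) from a citation made under the same convention."""
    slot = dict(zip(ORDER[relation], cited))
    return slot["A"], slot["B"]

first = encode("dissolves", "ether", "water")   # ether dissolves water
second = encode("dissolves", "water", "ether")  # water dissolves ether
print(first, second, decode("attacks", encode("attacks", "X", "Y")))
```

The two solubility statements yield different sequences, so a machine
able to discriminate one sequence from another can tell them apart.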

Whether this method is used or not depends, of course, on practical
considerations. One of these will be the amount of difficulty
experienced by the indexer and the encoder in establishing the proper
sequence of symbols to indicate the relationship which is specified
in the document being analyzed. Another consideration is the ability
of the searching machine to discriminate one sequence of symbols from
another. As already noted, machines now being developed have the
needed discriminating ability, and the decision as to whether order of
citation of symbols will be used to resolve asymmetric ambiguity must
be based on other considerations.

If we were never concerned with relations more complex than those
involving three elements, as exemplified by A R's B, then adopting
an ordering convention would perhaps afford the simplest and most
advantageous solution of the problem. However, there are asymmetric
relations of higher orders of complexity. For example, we may have
relations involving four elements such as (1) A gives B to C or (2)
B and C interact to produce A (as in plant hybridizing or in chemical
reactions). Such relations may be linked together, as in barter
trading, with A giving B to C and C giving D to A. Or, to give another
example, B and C may interact to produce A and D. To cope with such
relations by an ordering convention involves either (1) much more
intricate rules than required for the simple three-element relation
A R's B or (2) breaking the more complex relations up into
smaller units. Thus we might establish the order A R B C to indicate
A gives B to C (with R indicating the giving relation). Alternatively,
we might express this relation as the sum of A gives B and C receives
B. This alternative involves a repetition of the symbol B and also
double encoding of the relation (gives, receives) involved. In a
more complex situation, the degree of repetition becomes so great as
to raise grave questions as to the practicality of this approach.
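
The cost of breaking a relation into smaller units can be made concrete
with a small count of the symbols to be punched. The sketch below is
illustrative only; the function names are invented, and the statements
are written out in words rather than in encoded form.

```python
def as_single_statement(giver, gift, receiver):
    # Ordering convention A R B C, with R standing for "gives".
    return [(giver, "gives", gift, receiver)]

def as_smaller_units(giver, gift, receiver):
    # The same fact expressed as the sum of two three-element statements.
    return [(giver, "gives", gift), (receiver, "receives", gift)]

def symbol_count(statements):
    """Total number of symbols that would have to be punched."""
    return sum(len(s) for s in statements)

single = as_single_statement("A", "B", "C")
split = as_smaller_units("A", "B", "C")

# Barter trading: A gives B to C and C gives D to A.
barter_single = single + as_single_statement("C", "D", "A")
barter_split = split + as_smaller_units("C", "D", "A")

print(symbol_count(single), symbol_count(split))
print(symbol_count(barter_single), symbol_count(barter_split))
```

Under these assumptions each act of giving takes four symbols as a
single ordered statement and six when split, the gift being cited twice
and the relation encoded twice (gives, receives).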

In considering the advisability of relying on order as the sole
means for indicating relationships between such elements as A, R and
B, it is a matter of considerable practical importance that in general
conventions based on ordering will require the symbols to be arranged
in a different sequence than that in which the corresponding elements
appear in the document being analyzed. Thus, to cite a simple example,
if the idea to be encoded is "man bites dog", then this simple
relationship may be expressed in the document by the sentence, "a dog
was bitten by a man". Rearranging this simple sentence is, of course,
relatively easy. But as the coding conventions relating to ordered
sequence become more complex - as is inevitable when handling more
complex interactions - then following these conventions when encoding
will almost certainly be difficult and time-consuming.

Approved For Release 2000/08/25 : CIA-RDP57-00042A000200150015-6

Security Information

As is evident, many difficulties and complexities are encountered
when the attempt is made to use order of citation of index entries
as the sole means for resolving ambiguities caused by asymmetric
relations. For this reason, it appears worth while to investigate other
possibilities of taking asymmetric relations into account when
conducting searches.

By taking a slightly different approach it is possible to use a
predetermined order of citation as the basic means for resolving
ambiguity involving asymmetry and also to avoid some of the difficulties
mentioned in the preceding paragraphs. This approach might be
described - in order to make it more readily understandable - as
consisting of two steps. The first step requires that the index
entries - perhaps most conveniently in encoded form - be arranged in
the order which has been established as standard for resolving the
inherent ambiguity of the asymmetric relationship. The second step
is to attach numbers in the usual arithmetical order to the successive
index entries (or to their encoded designations) in the ordered
array. Actually, in performing the encoding operation the first step
would not be absolutely necessary. It would be necessary to keep in
mind, however, which place in the array would be occupied by a given
entry - or its encoded designation - even though the array itself were
perhaps not set up in explicit form. In other words, it might suffice -
particularly in simple cases - to have in mind which position a
given code designation would occupy in an array and then assign it
the appropriate number without going to the trouble of actually
writing out the array itself.

This approach is closely akin to the concept of "argument positions"
of formal logic. When a given relationship word, for example
a transitive verb, requires citation of two other entities to make
a complete statement concerning the action, then the relationship
word is said to have two arguments. Our example of A R's B involves
a relation R having two arguments A and B. Our example of A giving B
to C involves a relation - namely giving - having three arguments.

If the approach under consideration were adopted, the code dictionary
would be built up in such a way that entering a relationship
term in the dictionary would involve not only its code designation,
but also specification of the numerical indexes to be attached to the
respective arguments.

It is instructive to consider how this approach may be applied
to examples already discussed. Thus if the sequencing order A R B
is taken as the basis for expressing that A stands in relation R to
B, then the roles of A, R and B in that relationship might be symbolized
by attaching the prefix 1 to A, the prefix 2 to R and the prefix
3 to B. These new composite symbols may now be cited in any order
without any fear of ambiguity as a result. Thus we might indicate that A
stands in relation R to B by symbolic arrays of which 1A 2R 3B is one
example. Another is 2R 1A 3B; another is 3B 1A 2R, etc.
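
That the composite symbols may be cited in any order can be checked
with a short sketch. The functions parse and same_statement are
invented for illustration, and the sketch assumes that each prefix is a
single digit.

```python
def parse(array):
    """Split composite symbols such as '1A' into {index: symbol}.

    Assumes single-digit numerical prefixes (an illustrative simplification).
    """
    return {int(token[0]): token[1:] for token in array.split()}

def same_statement(x, y):
    """True if two arrays attach the same index to the same symbols."""
    return parse(x) == parse(y)

print(same_statement("1A 2R 3B", "2R 1A 3B"))
print(same_statement("1A 2R 3B", "3B 1A 2R"))
print(same_statement("1A 2R 3B", "1B 2R 3A"))
```

The first two comparisons succeed because only the order of citation
differs; the third fails because A and B have exchanged indexes, which
corresponds to the inverse statement that B stands in relation R to A.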

If this approach is applied to our example of A and B reacting
to form C and D, we might base the numbering of the elements on the
sequence A B R C D, in which case - after applying the index numerals -
we would have 1A 2B 3R 4C 5D. Similarly, if E gives F and G to H, we
might take the sequence E R F G H as the basis and apply the index
numerals to arrive at the symbolic representation 1E 2R 3F 4G 5H.
Certain disadvantages inherent in this approach become evident on
considering these examples. First, the symbolic representation 1A
2B 3R 4C 5D attaches different numbers to A and B, even though those
symbols both represent initial reacting substances, while the substances
formed are also represented by symbols to which different numerical
indexes are attached. In the other example, 1E 2R 3F 4G 5H, two
different numerical indexes are attached to F and G even though both
were given to H.

The first step toward improving this approach is to specify that
the same role indicator shall be used with all entities that have the
same role. From the viewpoint of theoretical logic, this would mean
that numerical indexes are used to indicate the argument type rather
than the position of the symbol in a standardized array set up for the
purpose of providing a basis for resolving ambiguity due to some
asymmetric relation. If this is done, the symbolic representation
of A and B reacting to form C and D might become:

          1A 1B 2R 3C 3D

Similarly, the representation of E giving F and G to H might take into
account the identity of the roles of F and G by the notation:

          1E 2R 3F 3G 4H

If the symbolism for indicating roles were set up in this way, comp- 
ilation of the code dictionary would require each dictionary entry 
denoting a relation to specify the appropriate numerical indexes to 
be used to indicate the various roles associated with a given re- 
lation (be it "reacting", "giving", etc.). 
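
A dictionary entry of the kind just described, giving the numerical
index for each role associated with a relation, might be sketched as
follows. The names REACTING, encode and by_index, and the role names
"reactant" and "product", are assumptions made for illustration; only
the resulting notation 1A 1B 2R 3C 3D comes from the text.

```python
from collections import defaultdict

# Hypothetical dictionary entry for the relation "reacting": the index
# attached to the relation symbol itself and to each role it involves.
REACTING = {"reactant": 1, "relation": 2, "product": 3}

def encode(relation_symbol, indexes, fillers):
    """Attach the index for each role to every entity filling that role."""
    tokens = [f"{indexes['relation']}{relation_symbol}"]
    for role, entities in fillers.items():
        tokens += [f"{indexes[role]}{entity}" for entity in entities]
    return sorted(tokens)

def by_index(tokens):
    """Group symbols by numerical index (single-digit prefixes assumed)."""
    groups = defaultdict(list)
    for token in tokens:
        groups[int(token[0])].append(token[1:])
    return dict(groups)

tokens = encode("R", REACTING, {"reactant": ["A", "B"], "product": ["C", "D"]})
print(" ".join(tokens))
print(by_index(tokens))
```

Because the index now denotes the argument type, A and B share the
index 1 and C and D share the index 3, however many substances take
part in the reaction.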

These examples also point the way to a possibility for simplifying
the use of numerical indexes - or similar symbols - to indicate
the different roles associated with various relations.


As already noted earlier in this appendix, our analysis of
asymmetric relations can be related to the "argument positions"
concept of formal logic. This concept permits us to use a general
symbolic designation to embrace a wide range of relationships. Thus
R was used to designate four different relations while A and B denoted
four different types of entities, as follows:


     A                R's                  B

     Substance A      dissolves            substance B
     Country A        attacks              country B
     Company A        is subsidiary to     company B
     Material A       is harder than       material B


The indexes 1, 2 and 3 might then be attached to A, R and B, respectively,
when confronted with any one of the four relations denoted
by the general symbolism A R's B.


Generalizing from this example, we can - if we deem it appropriate -
group together and express by generalized symbolism relations
which have the common feature of having the same number of logical
arguments. The same role indicators can then be used with any such set
of relations.


It is perhaps obvious that we are under no compulsion to group
together all relations which have the same number of logical arguments.
One of the problems of code construction is to arrive at decisions
as to how groupings of relations can be set up to best advantage
so as to keep the system of role indicators as simple as possible and
yet provide the type of discrimination effective in selecting needed
information.


If role indicators are set up on the basis of groupings of relations
characterized by the same number of logical arguments, it
would scarcely be possible to ascribe any specific meaning to the
role indicators, which in fact do no more than provide means for
resolving ambiguity arising from asymmetric relations. The possibility
exists, however, to ascribe definable meaning to the role indicators.
Thus in chemistry we might use the symbol "s" to denote a starting
material and "p" to denote a reaction product. Thus we might symbolize
the reaction of A and B to produce C and D by:

          sA sB R pC pD

where R denotes chemical reaction. If it seemed appropriate we
might generalize the symbol "s" to denote a wider range of entities,
including, for example, plants used for hybridizing, while "p" might
similarly be generalized to include entities produced, including, for
example, the result of hybridization. We might denote the hybridization
of plants A and B to produce the hybrid C by the symbolism:

          sA sB H pC

where H denotes hybridization. If the new variety C of a given species
were obtained by other means - e.g., as a spontaneous mutation, or by
plant selection - then the newly obtained plant would, by this approach,
also be denoted by pC. A machine search directed to pC would then
locate all those documents in which the plant variety C had been produced,
regardless of the means employed to establish the new variety. Such a
search would exclude documents in which variety C had been used in some
other role, e.g., parent plant used for hybridizing.
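
A search directed to pC can be sketched as follows. The three cards,
the symbol M for mutation and the varieties D and E are invented for
the example; only the role indicators "s" and "p" and the symbol H for
hybridization are taken from the text.

```python
# Hypothetical file of encoded cards; contents invented for illustration.
cards = {
    "doc-1": ["sA", "sB", "H", "pC"],  # A and B hybridized to give C
    "doc-2": ["M", "pC"],              # C arising as a spontaneous mutation
    "doc-3": ["sC", "sD", "H", "pE"],  # C used as a parent plant
}

def search(cards, entry):
    """Select cards carrying the entry with its role indicator."""
    return sorted(doc for doc, entries in cards.items() if entry in entries)

def search_ignoring_roles(cards, entity):
    """Select cards mentioning the entity in any role whatever."""
    def strip(entry):
        return entry[1:] if entry[0] in "sp" else entry
    return sorted(
        doc for doc, entries in cards.items()
        if entity in (strip(e) for e in entries)
    )

produced_c = search(cards, "pC")
any_c = search_ignoring_roles(cards, "C")
print(produced_c, any_c)
```

The search on pC selects doc-1 and doc-2 and passes over doc-3, whereas
a search that disregards the role indicators selects all three cards.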

So far discussion in this appendix has centered on asymmetric
relations and means for resolving ambiguities associated with them.
In terms of formal logic we have been concerned with relations and
their arguments, especially interactions and the entities directly
concerned. Circumstances surrounding an interaction have been left
out of consideration. Thus, for example, in speaking of chemical
reactions no mention was made of temperature, pressure, inert
solvents, catalysts and the like.